Department of Quantitative Social Science
Reducing bias due to missing values of the response variable by joint modeling with an auxiliary variable
Abstract
In this paper, we consider the problem of missing values of a continuous response variable that cannot be assumed to be missing at random. The example considered here is an analysis of pupils' subjective engagement at school using longitudinal survey data, where the engagement score from wave 3 of the survey is missing due to a combination of attrition and item non-response. If less engaged students are more likely to drop out and less likely to respond to questions regarding their engagement, then missingness is not ignorable and can lead to inconsistent estimates. We suggest alleviating this problem by modelling the response variable jointly with an auxiliary variable that is correlated with the response variable and not subject to non-response. Such auxiliary variables can be found in administrative data; in our example, this is the National Pupil Database, which contains test scores from national achievement tests. We estimate a joint model for engagement and achievement to reduce the bias due to missing values of engagement. A Monte Carlo study is performed to compare our proposed multivariate response approach with alternative approaches such as the Heckman selection model and inverse probability of selection weighting.
JEL classification: C13, C33, I21.
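The idea of borrowing strength from a fully observed auxiliary variable can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the model estimated in the paper: the bivariate normal data-generating process, the logistic selection rule with its coefficients, and all function names are hypothetical. The "joint model" estimate uses the standard maximum likelihood result for a bivariate normal with one variable fully observed, fitted under an ignorable likelihood. The Heckman and weighting comparators mentioned in the abstract are not implemented here.

```python
import numpy as np


def simulate(n, rho, rng):
    """Draw (y, z) from a standard bivariate normal with correlation rho.

    y plays the role of the response (e.g. engagement), z the fully
    observed auxiliary variable (e.g. a test score). Both the distribution
    and the selection rule below are illustrative assumptions.
    """
    cov = [[1.0, rho], [rho, 1.0]]
    y, z = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    # Nonignorable missingness: low values of y are less likely to be observed.
    p_observed = 1.0 / (1.0 + np.exp(-(0.5 + 1.0 * y)))
    observed = rng.random(n) < p_observed
    return y, z, observed


def complete_case_mean(y, observed):
    """Mean of y using only the cases where y is observed."""
    return y[observed].mean()


def joint_model_mean(y, z, observed):
    """Estimate of E[y] from a bivariate normal joint model of (y, z).

    With z observed for everyone, the estimate is the observed-case mean
    of y, shifted by the regression slope of y on z times the gap between
    the full-sample and observed-case means of z.
    """
    slope = np.cov(z[observed], y[observed])[0, 1] / np.var(z[observed], ddof=1)
    return y[observed].mean() + slope * (z.mean() - z[observed].mean())


def monte_carlo_bias(reps=200, n=2000, rho=0.7, seed=0):
    """Average bias of both estimators over repeated samples (true mean is 0)."""
    rng = np.random.default_rng(seed)
    cc, jm = [], []
    for _ in range(reps):
        y, z, observed = simulate(n, rho, rng)
        cc.append(complete_case_mean(y, observed))
        jm.append(joint_model_mean(y, z, observed))
    return float(np.mean(cc)), float(np.mean(jm))


if __name__ == "__main__":
    cc_bias, jm_bias = monte_carlo_bias()
    print(f"complete-case bias: {cc_bias:.3f}")
    print(f"joint-model bias:   {jm_bias:.3f}")
```

In this setup the joint-model estimate is expected to reduce the bias without removing it, because missingness depends on the response itself and the auxiliary variable is only a proxy for it; the reduction grows as the assumed correlation `rho` increases.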
Similar resources
Generalized Family of Estimators for Imputing Scrambled Responses
When there is a high correlation between the study and the auxiliary variables, the rank of the auxiliary variable also correlates with the study variable. Then, the use of the rank as an additional auxiliary variable may be helpful to increase the efficiency of the estimator of the mean or total of the population. In the present study, we propose two generalized familie...
Maximum likelihood analysis of the logistic regression model when predictor data are incomplete but auxiliary variables are available
Background and Objectives: Missing data exist in many studies, e.g. in regression models, and they decrease the model's efficacy. Many methods have been suggested for handling incomplete data; they have generally focused on missing outcome values. But covariate values can also be missing. Materials and Methods: In this paper we study the imputation of missing data by the EM algorithm and auxiliary varia...
Binary Regression With a Misclassified Response Variable in Diabetes Data
Objectives: Categorical data analysis is very important in statistics and the medical sciences. When the binary response variable is misclassified, fitting the model yields biased estimates of the adjusted odds ratios. The present study aimed to use a method to detect and correct misclassification error in the response variable of Type 2 Diabetes Mellitus (T2DM), applying binary ...
Correction of bias from non-random missing longitudinal data using auxiliary information.
Missing data are common in longitudinal studies due to drop-out, loss to follow-up, and death. Likelihood-based mixed effects models for longitudinal data give valid estimates when the data are missing at random (MAR). These assumptions, however, are not testable without further information. In some studies, there is additional information available in the form of an auxiliary variable known to...
Spatial Regression in the Presence of Misaligned data
In this paper, four approaches are presented to the problem of fitting a linear regression model in the presence of spatially misaligned data. These approaches are the plug-in method, simulation, regression calibration and maximum likelihood. In the first two approaches, by modeling the correlation of the explanatory variable, a prediction of the explanatory variable is determined at sites...